Machine learning analysis with population data for the associations of preterm birth with temporomandibular disorder and gastrointestinal diseases

This study employs machine learning analysis with population data for the associations of preterm birth (PTB) with temporomandibular disorder (TMD) and gastrointestinal diseases. The source of the population-based retrospective cohort was Korea National Health Insurance claims for 489,893 primiparous women with delivery at the age of 25–40 in 2017. The dependent variable was PTB in 2017. Twenty-one predictors were included, i.e., demographic, socioeconomic, disease and medication information during 2002–2016. Random forest variable importance was derived to find important predictors of PTB and to evaluate its associations with the predictors, including TMD and gastroesophageal reflux disease (GERD). Shapley Additive Explanation (SHAP) values were calculated to analyze the directions of these associations. The random forest with oversampling registered a much higher area under the receiver-operating-characteristic curve than logistic regression with oversampling, i.e., 79.3% vs. 53.1%. According to random forest variable importance values and rankings, PTB has strong associations with low socioeconomic status, GERD, age, infertility, irritable bowel syndrome, diabetes, TMD, salivary gland disease, hypertension, tricyclic antidepressant and benzodiazepine. In terms of max SHAP values, these associations were positive, e.g., low socioeconomic status (0.29), age (0.21), GERD (0.27) and TMD (0.23). In other words, the inclusion of low socioeconomic status, age, GERD or TMD in the random forest can increase the predicted probability of PTB by as much as 0.29, 0.21, 0.27 or 0.23, respectively. A cutting-edge approach of explainable artificial intelligence highlights the strong associations of preterm birth with temporomandibular disorder, gastrointestinal diseases and antidepressant medication. Close surveillance of pregnant women is needed regarding these multiple risks at the same time.


Introduction
Preterm birth (PTB) is defined as "birth before 37 weeks of gestation" [1] (abbreviations are listed in S1 Table). It consists of PTB with premature rupture of membranes (PROM), preterm labor and birth without PROM, and other PTB. Its frequency varies from 5% to 9% in developed countries, but in general it has increased with the growth of indicated PTB and other factors, e.g., from 9.5% in 1981 to 12.7% in 2005 in the United States [1]. Several studies reported that PTB has positive associations with maternal depression and stress [2–5]. Based on a review, preterm birth was a major risk factor for maternal depression, neurodevelopmental delay and childhood disability [2]. According to an animal experiment, prenatal maternal stress was a major cause of preterm birth and neonatal immunity in mice [3]. The results of the studies above agree with those of two other reviews stating that maternal depression and stress during pregnancy affect preterm birth [4,5].
Likewise, temporomandibular disorder (TMD) and gastrointestinal disease are expected to have positive associations with depression and stress [6–11]. TMD can be defined as "disorder affecting joints and muscles" [6]. Its symptoms are joint pain, muscle pain and limited mouth opening, whereas its etiological causes are occlusion, trauma, pain stimulus, parafunctional activity and psychological stress [6]. A prospective study of female students before the university entrance reported a positive relationship between stress and TMD [7]. This positive relationship between stress and TMD was affirmed by a review of 33 original studies [8] and a survey of 112 participants in a general hospital [9]. Also, two other reviews highlighted a positive association between depression and gastrointestinal disease such as gastroesophageal reflux disease (GERD) and irritable bowel syndrome [10,11]. Based on the findings above, one would expect a positive relationship among PTB, TMD and GERD. However, no literature has been available on this topic. In this context, this study employs machine learning analysis with population data for the associations of PTB with TMD and gastrointestinal diseases.

Participants and variables
The source of the population-based retrospective cohort was Korea National Health Insurance claims for 489,893 primiparous women with delivery at the age of 25–40 in 2017. This retrospective cohort study was approved by the Institutional Review Board (IRB) of Korea University Anam Hospital on June 12, 2023 (2020AN0014). Informed consent was waived by the IRB. The data were accessed for the research from July 1, 2023 to August 31, 2023. The authors did not have access to information that could identify individual participants during or after data collection. The dependent variables were four categories of PTB (birth before 37 weeks of gestation) in 2017 based on ICD-10 codes: PTB 1, PTB with PROM only; PTB 2, preterm labor and birth without PROM; PTB 3, PTB 1 or PTB 2; PTB 4, PTB 3 or other indicated PTB (S2 Table). The 21 independent variables were: (1) two demographic/socioeconomic predictors in 2016, i.e., age and low socioeconomic status, the latter ranging from 1 (the highest) to 20 (the lowest) in terms of the insurance fee; (2) five dental diseases for any of the years 2002–2016, i.e., dental cavity, periodontitis, salivary gland disease, tooth loss and TMD; (3) four gastrointestinal diseases for any of the years 2002–2016, i.e., Crohn's disease, GERD, irritable bowel syndrome and ulcerative colitis; (4) four obstetric conditions for any of the years 2002–2016, i.e., infertility, hypertension, diabetes and gestational diabetes; (5) six medication predictors in 2016, i.e., benzodiazepine, calcium channel blocker, nitrate, progesterone, sleeping pill and tricyclic antidepressant. These 21 independent variables were selected according to previous studies and data availability. The disease and medication data were screened from ICD-10 and ATC codes, respectively (S2 and S3 Tables).
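The nesting of the four PTB categories and the grouping of the 21 predictors can be made concrete with a small sketch. This is only an illustration in Python, not the authors' code: the record fields, the variable names and the function `ptb_outcomes` are hypothetical, and the actual ICD-10 and ATC codes are those listed in S2 and S3 Tables, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryRecord:
    """Hypothetical per-woman flags; in the study they would come from ICD-10 codes."""
    ptb_with_prom: bool
    preterm_labor_without_prom: bool
    other_indicated_ptb: bool


def ptb_outcomes(rec):
    """Derive the four nested dependent variables described in the text."""
    ptb1 = rec.ptb_with_prom
    ptb2 = rec.preterm_labor_without_prom
    ptb3 = ptb1 or ptb2                      # PTB 3 = PTB 1 or PTB 2
    ptb4 = ptb3 or rec.other_indicated_ptb   # PTB 4 = PTB 3 or other indicated PTB
    return {"PTB1": ptb1, "PTB2": ptb2, "PTB3": ptb3, "PTB4": ptb4}


# Predictor groups as listed in the text (variable names are assumptions).
PREDICTOR_GROUPS = {
    "demographic_socioeconomic": ["age", "low_ses"],
    "dental": ["dental_cavity", "periodontitis", "salivary_gland_disease",
               "tooth_loss", "tmd"],
    "gastrointestinal": ["crohns_disease", "gerd", "irritable_bowel_syndrome",
                         "ulcerative_colitis"],
    "obstetric": ["infertility", "hypertension", "diabetes",
                  "gestational_diabetes"],
    "medication": ["benzodiazepine", "calcium_channel_blocker", "nitrate",
                   "progesterone", "sleeping_pill", "tricyclic_antidepressant"],
}

if __name__ == "__main__":
    print(ptb_outcomes(DeliveryRecord(False, True, False)))
    print(sum(len(names) for names in PREDICTOR_GROUPS.values()), "predictors")
```

A woman flagged only for "other indicated PTB" counts as a case for PTB 4 but not for PTB 1, PTB 2 or PTB 3, which is why the four outcomes are modeled separately.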

Machine learning analysis
Logistic regression and the random forest were used for the prediction of PTB. A random forest is a group of decision trees which make majority votes on the dependent variable ("bootstrap aggregation") [12,13]. The 489,893 cases with full information were divided into training and validation sets with an 80:20 ratio (391,914 vs. 97,979 cases). The validation criteria were accuracy (the proportion of correct predictions among the 97,979 validation cases) and the area under the receiver-operating-characteristic curve (AUC) (the area under the plot of sensitivity vs. 1-specificity). Random forest variable importance was derived to find important predictors of PTB and to evaluate the strengths of its associations with the predictors. Random forest Shapley Additive Explanation (SHAP) values were calculated to analyze the directions of these associations. The permutation importance of a predictor indicates the decrease in model accuracy caused by permuting the data on that predictor. In the case of the random forest, it is an average over all trees with a value of 0 to 1. The SHAP value of a predictor for a participant measures the difference between what machine learning predicts for the probability of PTB with and without the predictor. For example, let us assume that the SHAP values of TMD for PTB have the range of (−0.10, 0.23). Here, some participants have SHAP values as low as −0.10, and other participants have SHAP values as high as 0.23. That is, the inclusion of the predictor (TMD) in machine learning changes the predicted probability of the dependent variable (PTB) by between −0.10 and 0.23, depending on the participant. In terms of the max SHAP value (0.23), there exists a positive association between TMD and PTB in general [12].
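The validation criteria and the permutation importance described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the analysis code of this study: the toy data, the rule `toy_predict` and all function names are hypothetical, and a real analysis would apply the same idea to a fitted random forest.

```python
import random


def accuracy(y_true, y_pred):
    """Proportion of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive case is scored above a
    random negative case (ties count one half); this equals the area under
    the plot of sensitivity vs. 1-specificity."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, col, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling the values of one predictor."""
    rng = random.Random(seed)
    base = accuracy(y, [predict(row) for row in X])
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in place of the original.
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(base - accuracy(y, [predict(row) for row in X_perm]))
    return sum(drops) / n_repeats


# Toy data (hypothetical): the outcome depends on column 0 only.
X_demo = [[a, b] for a in (0, 1) for b in (0, 1)] * 2
y_demo = [row[0] for row in X_demo]


def toy_predict(row):
    """Stand-in for a fitted model: predicts the outcome from column 0."""
    return row[0]


if __name__ == "__main__":
    print("AUC:", auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print("importance of column 0:", permutation_importance(toy_predict, X_demo, y_demo, 0))
    print("importance of column 1:", permutation_importance(toy_predict, X_demo, y_demo, 1))
```

Shuffling a predictor the model relies on lowers accuracy, whereas shuffling a predictor the model ignores leaves it unchanged, so the latter receives an importance of zero.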
In practice, experts in artificial intelligence use random forest permutation importance to derive the rankings and values of all predictors for the prediction of the dependent variable. Then, they employ the SHAP plots to evaluate the directions of associations between the predictors and the dependent variable. Linear or logistic regression used to play this role before the SHAP approach took it over. This is because the SHAP approach has a notable strength compared to linear or logistic regression: the former considers all realistic scenarios, unlike the latter with an unrealistic assumption of ceteris paribus, i.e., "all the other variables staying constant". Let us assume that there are three predictors of PTB, i.e., low socioeconomic status, GERD and TMD. As defined above, the SHAP value of TMD for PTB for a particular participant is the difference between what machine learning predicts for the probability of PTB with and without TMD for the participant. Here, the SHAP value for the participant is the average over the following four scenarios for the participant: (1) low socioeconomic status excluded, GERD excluded; (2) low socioeconomic status included, GERD excluded; (3) low socioeconomic status excluded, GERD included; and (4) low socioeconomic status included, GERD included [13]. Finally, it can be noted that R-Studio 1.3.959 (R-Studio Inc.: Boston, United States) was employed for the analysis from January 1, 2023 to February 28, 2023.
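The three-predictor example above can be worked through numerically. The sketch below computes exact Shapley values for a toy model in Python; the function `toy_model` and its probabilities are invented for illustration and are not estimates from this study. Note that the standard Shapley value is a weighted average of the four scenarios, with weights 1/3, 1/6, 1/6 and 1/3 for three predictors, and that SHAP software for tree models relies on faster algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

PREDICTORS = ("low_ses", "gerd", "tmd")


def toy_model(included):
    """Hypothetical predicted probability of PTB given the included predictors."""
    p = 0.05
    if "low_ses" in included:
        p += 0.04
    if "gerd" in included:
        p += 0.03
    if "tmd" in included:
        p += 0.02
    if "gerd" in included and "tmd" in included:
        p += 0.02  # assumed interaction between GERD and TMD
    return p


def marginal_contributions(target, model, predictors):
    """Prediction with minus without `target`, for every scenario of the others."""
    others = [p for p in predictors if p != target]
    out = []
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            s = frozenset(subset)
            out.append((s, model(s | {target}) - model(s)))
    return out


def shapley_value(target, model, predictors):
    """Shapley-weighted average of the marginal contributions of `target`."""
    n = len(predictors)
    total = 0.0
    for s, delta in marginal_contributions(target, model, predictors):
        weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
        total += weight * delta
    return total


if __name__ == "__main__":
    for scenario, delta in marginal_contributions("tmd", toy_model, PREDICTORS):
        print(sorted(scenario), round(delta, 4))
    print("Shapley value of TMD:", round(shapley_value("tmd", toy_model, PREDICTORS), 4))
```

In this toy setting, the Shapley values of the three predictors sum to the difference between the prediction with all predictors and the prediction with none, which is the additivity property that gives SHAP its name.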
The positive association between PTB and TMD was more apparent in Figs 1-4. Here, points with low TMD values and low SHAP values for PTB were positioned in the bottom left, while points with high TMD values and high SHAP values for PTB were positioned in the top right (Figs 1-4). These figures are called the SHAP dependence plots of TMD vs. PTB. In these figures, the blue (or red) color represents the absence (or presence) of tricyclic antidepressant for a participant, which was found to have the highest correlation with TMD for the prediction of PTB. Here, points with low TMD values, the absence of tricyclic antidepressant and low SHAP values for PTB were positioned in the bottom left, whereas points with high TMD values, the presence of tricyclic antidepressant and high SHAP values for PTB were positioned in the top right.

Summary
Based on random forest variable importance values and rankings, PTB has strong associations with low socioeconomic status, GERD, age, infertility, irritable bowel syndrome, diabetes, TMD, salivary gland disease, hypertension, tricyclic antidepressant and benzodiazepine. Specifically, the positive association among PTB, TMD and tricyclic antidepressant was more apparent in the SHAP dependence plot: points with low TMD values, the absence of tricyclic antidepressant and low SHAP values for PTB were positioned in the bottom left, whereas points with high TMD values, the presence of tricyclic antidepressant and high SHAP values for PTB were positioned in the top right.

Contributions
The results of previous studies based on self-reported questionnaires [14–17] have been mixed on an association between PTB and TMD. The relationship was not statistically significant between control and experimental adolescents in some examinations [14,15], but positive in another investigation [16]. Indeed, a systematic review confirmed the former finding (no significance) [17]. The unique contribution of this study is that it used machine learning and population data to confirm the positive association between PTB and TMD. One plausible pathway between PTB and TMD would be vitamin deficiency, as in the case of PTB and oral hygiene status [18]. Moreover, the findings of explainable artificial intelligence (SHAP) in this study shed new light on an association among PTB, TMD and antidepressant medication (depression). A systematic review and a prospective cohort study reported that pregnant women experience intense stress and that this becomes a significant risk factor for PTB [19,20]. Pregnant women would take antidepressants to relieve their mental burden, but this is expected to cause a vicious cycle of furthering the risk of PTB: three systematic reviews highlighted a positive relationship between antidepressant medication and PTB [21–23]. As addressed above, in a similar context, a prospective study of female students before the university entrance reported a positive association between stress and TMD [7]. This positive association between stress and TMD was affirmed by a review of 33 original studies [8] and a survey of 112 participants in a general hospital [9]. This study extends the horizon of existing literature by employing explainable artificial intelligence and population data to identify an association among PTB, TMD, GERD and antidepressant medication (depression) together. Previous studies highlight behavioral, infectious, neuroendocrine and neuroinflammatory mechanisms between stress and PTB [24]. Likewise, one possible pathway between stress and TMD would be the hypothalamic-pituitary-adrenal axis and the serotoninergic and opioid systems [25]. These statements can be extended to include GERD. No examination has been done and more investigation is needed in this direction.

Limitations
This study had some limitations. Firstly, this study did not examine etiological differences between preterm birth due to maternal and fetal indication, and preterm birth due to spontaneous preterm labor. Secondly, this study did not investigate possible mediating effects among independent variables. Thirdly, this study did not use recurrent neural networks (deep learning models) because of the limited capacity of the computer server in the Korea National Health Insurance Service data analysis center during the study period. Fourthly, this study did not cover parameter tuning and used the default hyper-parameters of the random forest, i.e., 100 trees, the Gini splitting criterion and no specified maximum depth of trees.

Fig 1. SHAP dependence plot PTB1-3,000 sampling. Legend: The positive association between PTB (preterm birth) and TMD (temporomandibular disorder) was more apparent in Figs 1-4. Here, points with low TMD values and low SHAP values for PTB were positioned in the bottom left, while points with high TMD values and high SHAP values for PTB were positioned in the top right. These figures are called the SHAP dependence plots of TMD vs. PTB. In these figures, the blue (or red) color represents the absence (or presence) of TCA (tricyclic antidepressant) for a participant, which was found to have the highest correlation with TMD for the prediction of PTB. Here, points with low TMD values, the absence of TCA and low SHAP values for PTB were positioned in the bottom left, whereas points with high TMD values, the presence of TCA and high SHAP values for PTB were positioned in the top right. https://doi.org/10.1371/journal.pone.0296329.g001